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Password Cracking: Basics 


System stores hash of user's password 

Authentication works by hashing user input 
and comparing result with stored hash 




stored 

Hash 



Password Cracking works exactly the same 

— Try different passwords and eventually you'll find 
the correct one 



Password Cracking: Trends 

Changes in past five years: 

• «Salting» used to avoid pre-computations 

• (Much) stronger cryptography is used 

• Password cracking became much slower 

What can be done? 

• Continue research 

• Increase password recovery speed 



ELCOMSOFT 

PROACTIVE SOFTVI/ARE 


Password Recovery: How to do it Faster? 


Softwa re 
Optimization 




Special 

Hardware 




• Hardware already installed in many computers 

• Cost-effective 

• Compatible with software from different vendors 

• Can be used for other applications 




Common 

Hardware 




• Free for End-Users 

• Limited speedup (10-20%) 

• Need to re-optimize for every new CPU 

• Expensive devices 

• Won't work with software from other vendors 
•Not very cost-effective 
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GPU: Two Worlds 



NVIDIA 

ATI 

Name 

CUDA 

CAL 

First public beta 

Feb 2007 

Mar 2008 

Minimum hardware 

GeForce 8 

HD 2X00 

High-level language 

C/C++ (NVCC) 

C/C++ 

(Brook+) 

Intermediate language 

PTX 

IL 

Low-level language 

No 

For R600 only 




CUDA Basics 


• GPU is a highly multithreaded data-parallel 
coprocessor 

- Up to 128 processors (16 multiprocessors) 

• Fast on-board RAM 

- Up to 70 GiB/sec throughput 

• Completely different programming model 

— Write your program from scratch rather than 
porting serial implementation 


Task Partitioning 

Function compiled for GPU is called kernel 
Kernel runs as a grid of thread blocks 

— Block may be ID, 2D, or 3D 

— Grid may be ID or 2D 

Threads can communicate within block 

No grid-level communication or 
synchronization 


Up to 512 threads 
in one block 


Grid sizes up to 
65535x65535 


Hardware allows 
up to 2^^ threads 


Kernel 1 



Thread 
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Thread 
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Thread 
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Thread 
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Thread 
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Thread 
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Thread 
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Thread 
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Memory Model 


Multiprocessor-level 

Type 

Scope 

Access 

Speed 

Registers 

Thread 

R/W 

Very Fast 

Shared memory 

Block 

R/W 

Very Fast 


Device-level 


Type 

Scope 

Access 

Speed 

Global memory 

Grid 

R/W 

Slow 

Local memory 

Thread 

R/W 

Slow 

Constant memory 

Grid 

RO 

Fast, cached 

Texture memory 

Grid 

RO 

Fast, cached 














• 8192 32-bit registers 
per MP 

• 16Kb shared memory 
per MP 

• 64Kb constant 
memory per grid 

• 8Kb constant cache 
per MP 

• 8Kb texture cache 
per MP 


Device 

Multiprocessor N 


Multiprocessor 2 
Multiprocessor 1 






Password Cracking on CPU 


Very basic password cracker is simple: 


while ( 1 ) { 

password = get_next_pas sword () ; 
hash = calculate_hash (password) ; 
if ( is_correct (hash) ) 

{ 

print "Password found:" + password; 
break; 


} 


} 


/ PWD 7 

H 

I 


/ Hash / 



/ stored Hash / 


Easy to parallelize! 



Password Cracking on CPUs 

Very basic parallel password cracking thread: 


while ( not_found ) { 

mutex_lock ( ) ; 

password = get_next_pas sword ( ) ; 

mutex_\anlock ( ) ; 

hash = calculate_hash (password) ; 
if ( is_correct (hash) ) 

{ 

print "Password found:" + password; 

not_found = true; 

break; 


} 


} 


Spawn as many threads as CPUs/cores you have 
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Password Cracking on GPU 


Fits well to CUDA programming model 
/V threads check N password in parallel 
No inter-thread communications needed 






Tutorial: MD5 cracker on GPU 


What /-th password cracking thread must do? 

1. Generate (start+i)-th password 

2. MD5(password) 

3. If hash is correct, return / to the host 
Notes: 

1. Host doesn't need to do MD5 at all! 

2. Dictionary requires more host to GPU transfers 
and thus less attractive. We'll do bruteforce. 



Tutorial: MD5 cracker on GPU 

GPU kernel environment: 


constant 

unsigned int nPassword [ 32 ] ; 


constant 

unsigned int nPasswordLen; 


constant 

unsigned int cCharset [256] ; 


constant 

unsigned int nCharsetLen; 


constant 

unsigned long bHash[4]; 


device 

void MDSTransf orm ( unsigned 

long ’^s. 


unsigned 

long ’^d ) ; 

global 

void MD5 Brute GPU ( unsigned long *pdwResult ) ; 



Tutorial: MD5 cracker on GPU 

Getting password to process: 

unsigned char Block [64] = { 0 }; 

unsigned int tid = blockidx.x ^ 256 + threadidx.x; 
unsigned int t, r, i, q = tid; 
for ( i = 0; i < nPasswordLen; i++ ) { 

t = q + nPassword [ i ] ; 
q = t/nCharsetLen; 
r = t - q^nCharsetLen; 

Block[i] = cCharset[r]; 

} 

/ / MD5 padding & length 
Block[ i] = 0x80; 

Block [56] = nPasswordLen ^ 8; 



Tutorial: MD5 cracker on GPU 

MD5 core (taken from RFC1321): 


device 

void 

MDSTransf orm ( 

unsigned 

long ’^state. 




unsigned 

long ) { 

unsigned 

long 

a = state [ 0 ] , 

b = state [ 1 ] 

• 

r 

unsigned 

long 

c = state [2] f 

d = state [ 3 ] 

• 

r 

/ Round 

1 */ 




FF (a, b, 

• • • 

Cf d 

, x[ 0] , Sll, 

0xd7 6aa478) ; 

/* 1 */ 

state [ 0 ] 

+= a; 




state [ 1 ] 

+= b; 




state [ 2 ] 

+= c; 




state [ 3 ] 

} 

+= d; 






Tutorial: MD5 cracker on GPU 

MD5 transform in kernel: 


MDSTransf orm ( State, (unsigned long’^) Block ); 


Hash compare: 


} 


if ( State[0] == bHash[0] ) 

if ( State[l] == bHash[l] ) 

if ( State[2] == bHash[2] ) 

if ( State[3] == bHash[3] ) 
’^pdwResult = tid; 



Tutorial: MD5 cracker on GPU 


Calling kernel from host: 

extern "C" void RunKernel_MD5 ( int grid, 

unsigned iong ’^pdwResuit ) { 

MD5 Brute GPU<<< grid, 256 >>> ( pdwResuit ); } 


CUDA initialization: 


int deviceCount = 0; 
cudaError rc; 

rc = cudaGetDeviceCount ( SdeviceGount ) ; 
if ( rc != cudaSuccess ) { 

printf ( " ! cudaGetDeviceGount ( ) faiied: 

cudaGetErrorString ( rc ) ) ; 

return 0; 


} 


%s\n" 


r 



Tutorial: MD5 cracker on GPU 

Allocate GPU memory: 

rc = cudaMalloc ( &pdResult, 4 ); 
if ( rc != cudaSuccess ) { 

printf ( " ! cudaMalloc () failed: %s\n", 

cudaGetErrorString ( rc ) ) ; 

return 0; 

} 

Copy data to GPU memory: 

rc = cudaMemcpy ( pdResult, &nResult, 4, 

cudaMemcpyHostToDevice ) ; 
if ( rc != cudaSuccess ) { 

printf ( " ! cudaMemcpy ( ) failed: %s\n'\ 

CudaGetErrorString ( rc ) ) ; 

return 0; 



Tutorial: MD5 cracker on GPU 

Copy data to GPU constant memory: 


rc = cudaMemcpyToSymbol ( "nPassword", nPassword, 32’^4 ); 
if ( rc != cudaSuccess ) 

{ 

printf ( " ! cudaMemcpyToSymbol ( ) failed: %s\n", 

cudaGetErrorString ( rc ) ) ; 

return 0; 

} 



Tutorial: MD5 cracker on GPU 

Main loop: 

while ( 1 ) 

{ 

RunKernel_MD5 ( 8192, (unsigned long’^ ) pdResult ); 
rc = cudaThreadSynchronize ( ) ; 
if ( rc != cudaSuccess ) { 

printf ( " ! cudaThreadSynchronize ( ) failed: %s\n", 

cudaGetErrorString ( rc ) ) ; 

break; 

} 

cudaMemcpy ( &nResult, pdResult, 4 , cudaMemcpyDeviceToHost ); 
// Increment password by 8192*256 

• • • 

} 
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Tutorial: MD5 cracker on GPU 

This unoptimized version runs at 

115M p/s 
on 8800 GTX 


lOx faster than dual-core Core2 @ 1.86 GHz 


Tutorial: MD5 cracker on GPU 


Possible optimizations: 

• Don't use integer division 

- Use multiplication instead 

• Unroll loop 

— This will move Block [ ] from local memory to 
(much) faster registers 


We've released optimized & free GPU MD5 

cracker! 
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GPU: Performance 


Password Recovery Speed 



■ 8800GTX 

■ 8600GTS 

■ Core2 Duo 


0 50 100 150 200 250 300 350 400 


Millions 
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